15 research outputs found

    A modelling framework for detecting and leveraging node-level information in Bayesian network inference

    Bayesian graphical models are powerful tools to infer complex relationships in high dimensions, yet are often fraught with computational and statistical challenges. If exploited in a principled way, the increasing information collected alongside the data of primary interest constitutes an opportunity to mitigate these difficulties by guiding the detection of dependence structures. For instance, gene network inference may be informed by the use of publicly available summary statistics on the regulation of genes by genetic variants. Here we present a novel Gaussian graphical modelling framework to identify and leverage information on the centrality of nodes in conditional independence graphs. Specifically, we consider a fully joint hierarchical model to simultaneously infer (i) sparse precision matrices and (ii) the relevance of node-level information for uncovering the sought-after network structure. We encode such information as candidate auxiliary variables using a spike-and-slab submodel on the propensity of nodes to be hubs, which allows hypothesis-free selection and interpretation of a sparse subset of relevant variables. As efficient exploration of large posterior spaces is needed for real-world applications, we develop a variational expectation conditional maximisation algorithm that scales inference to hundreds of samples, nodes and auxiliary variables. We illustrate and exploit the advantages of our approach in simulations and in a gene network study which identifies hub genes involved in biological pathways relevant to immune-mediated diseases.
    Comment: 54 pages, 10 figures, 5 tables

    Efficient inference for genetic association studies with multiple outcomes

    Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modelling are either too simplistic to be powerful or are impracticable for computational reasons. Inspired by Richardson et al. (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses. As Markov chain Monte Carlo (MCMC) inference on such models can be prohibitively slow when the number of genetic variants exceeds a few thousand, we propose a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, dealing within hours with problems involving hundreds of thousands of genetic variants and tens to hundreds of clinical or molecular outcomes.

    Large-scale variational inference for Bayesian joint regression modelling of high-dimensional genetic data

    Genetic association studies have become increasingly important in understanding the molecular bases of complex human traits. The specific analysis of intermediate molecular traits, via quantitative trait locus (QTL) studies, has recently received much attention, prompted by the advance of high-throughput technologies for quantifying gene, protein and metabolite levels. Of great interest is the detection of weak trans-regulatory effects between a genetic variant and a distal gene product. In particular, hotspot genetic variants, which remotely control the levels of many molecular outcomes, may initiate decisive functional mechanisms underlying disease endpoints. This thesis proposes a Bayesian hierarchical approach for joint analysis of QTL data on a genome-wide scale. We consider a series of parallel sparse regressions combined in a hierarchical manner to flexibly accommodate high-dimensional responses (molecular levels) and predictors (genetic variants), and we present new methods for large-scale inference. Existing approaches have limitations. Conventional marginal screening does not account for local dependencies and association patterns common to multiple outcomes and genetic variants, whereas joint modelling approaches are restricted to relatively small datasets by computational constraints. Our novel framework allows information-sharing across outcomes and variants, thereby enhancing the detection of weak trans and hotspot effects, and implements tailored variational inference procedures that allow simultaneous analysis of data for an entire QTL study, comprising hundreds of thousands of predictors, and thousands of responses and samples. The present work also describes extensions to leverage spatial and functional information on the genetic variants, for example, using predictor-level covariates such as epigenomic marks. 
    Moreover, we augment variational inference with simulated annealing and parallel expectation-maximisation schemes in order to enhance exploration of highly multimodal spaces and allow efficient empirical Bayes estimation. Our methods, publicly available as packages implemented in R and C++, are extensively assessed in realistic simulations. Their advantages are illustrated in several QTL applications, including a large-scale proteomic QTL study on two clinical cohorts that highlights novel candidate biomarkers for metabolic disorders.

    Des chiffres et des lettres: refonte du plan de classement du CEDOC-CEC André-Chavanne [Numbers and letters: overhaul of the classification scheme of the CEDOC-CEC André-Chavanne]

    We were commissioned by the documentation centre of the Collège et Ecole de commerce André-Chavanne in Geneva (CEDOC) to overhaul its classification scheme, which is based on a very fine-grained use of the Universal Decimal Classification (UDC). Ms Véronique Debellemanière, our client and head of the CEDOC, questioned whether a system so elaborate that it produces long and complicated call numbers is appropriate for a post-compulsory school library. Our objectives for this mandate were therefore, broadly, to make the collection more accessible and to harmonise the classification scheme. To achieve these goals, we went through several stages, which allowed us to choose the type of classification scheme to apply before finally moving on to the implementation phase of our work.

    EPISPOT: An epigenome-driven approach for detecting and interpreting hotspots in molecular QTL studies.

    We present EPISPOT, a fully joint framework which exploits large panels of epigenetic annotations as variant-level information to enhance molecular quantitative trait locus (QTL) mapping. Thanks to a purpose-built Bayesian inferential algorithm, EPISPOT accommodates functional information for both cis and trans actions, including QTL hotspot effects. It effectively couples simultaneous QTL analysis of thousands of genetic variants and molecular traits with hypothesis-free selection of biologically interpretable annotations which directly contribute to the QTL effects. This unified, epigenome-aided learning boosts statistical power and sheds light on the regulatory basis of the uncovered hits; EPISPOT therefore marks an essential step toward improving the challenging detection and functional interpretation of trans-acting genetic variants and hotspots. We illustrate the advantages of EPISPOT in simulations emulating real-data conditions and in a monocyte expression QTL study, which confirms known hotspots and finds other signals, as well as plausible mechanisms of action. In particular, by highlighting the role of monocyte DNase-I sensitivity sites from >150 epigenetic annotations, we clarify the mediation effects and cell-type specificity of major hotspots close to the lysozyme gene. Our approach forgoes the daunting and underpowered task of one-annotation-at-a-time enrichment analyses for prioritizing cis and trans QTL hits and is tailored to any transcriptomic, proteomic, or metabolomic QTL problem. By enabling principled epigenome-driven QTL mapping transcriptome-wide, EPISPOT helps progress toward a better functional understanding of genetic regulation.

    A fully joint Bayesian quantitative trait locus mapping of human protein abundance in plasma.

    Molecular quantitative trait locus (QTL) analyses are increasingly popular to explore the genetic architecture of complex traits, but existing studies do not leverage shared regulatory patterns and suffer from a large multiplicity burden, which hampers the detection of weak signals such as trans associations. Here, we present a fully multivariate proteomic QTL (pQTL) analysis performed with our recently proposed Bayesian method LOCUS on data from two clinical cohorts, with plasma protein levels quantified by mass-spectrometry and aptamer-based assays. Our two-stage study identifies 136 pQTL associations in the first cohort, of which >80% replicate in the second independent cohort and have significant enrichment with functional genomic elements and disease risk loci. Moreover, 78% of the pQTLs whose protein abundance was quantified by both proteomic techniques are confirmed across assays. Our thorough comparisons with standard univariate QTL mapping on (1) these data and (2) synthetic data emulating the real data show how LOCUS borrows strength across correlated protein levels and markers on a genome-wide scale to effectively increase statistical power. Notably, 15% of the pQTLs uncovered by LOCUS would be missed by the univariate approach, including several trans and pleiotropic hits with successful independent validation. Finally, the analysis of extensive clinical data from the two cohorts indicates that the genetically-driven proteins identified by LOCUS are enriched in associations with low-grade inflammation, insulin resistance and dyslipidemia and might therefore act as endophenotypes for metabolic diseases. While considerations on the clinical role of the pQTLs are beyond the scope of our work, these findings generate useful hypotheses to be explored in future research; all results are accessible online from our searchable database. 
    Thanks to its efficient variational Bayes implementation, LOCUS can jointly analyze thousands of traits and millions of markers. Its applicability goes beyond pQTL studies, opening new perspectives for large-scale genome-wide association and QTL analyses. Diet, Obesity and Genes (DiOGenes) trial registration number: NCT00390637.

    Longitudinal analysis reveals that delayed bystander CD8+ T cell activation and early immune pathology distinguish severe COVID-19 from mild disease.

    The kinetics of the immune changes in COVID-19 across severity groups have not been rigorously assessed. Using immunophenotyping, RNA sequencing and serum cytokine analysis, we analyzed serial samples from 207 SARS-CoV-2-infected individuals with a range of disease severities over 12 weeks from symptom onset. An early robust bystander CD8+ T cell immune response, without systemic inflammation, characterized asymptomatic or mild disease. Hospitalized individuals had delayed bystander responses and systemic inflammation that was already evident near symptom onset, indicating that immunopathology may be inevitable in some individuals. Viral load did not correlate with this early pathological response, but did correlate with subsequent disease severity. Immune recovery is complex, with profound persistent cellular abnormalities in severe disease correlating with altered inflammatory responses, with signatures associated with increased oxidative phosphorylation replacing those driven by the cytokines tumor necrosis factor (TNF) and interleukin (IL)-6. These late immunometabolic and immune defects may have clinical implications.

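    Estimations d’erreur a posteriori pour problèmes de contrôle optimal décrits par des équations aux dérivées partielles [A posteriori error estimates for optimal control problems governed by partial differential equations]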

    This project addresses the general treatment of optimal control problems for partial differential equations and gives an overview of a posteriori error estimates that can be applied in this setting. The concepts involved are illustrated through the discussion of two numerical examples and their variants. The implementation is carried out with the FreeFEM++ software (dedicated to the finite element method).

    Simulated data emulating real mQTL data

    Simulated data emulating real metabolite quantitative trait locus (mQTL) data. SNPs are generated under Hardy–Weinberg equilibrium (binomial distribution) using the sample minor allele frequencies and spatial correlation structure of more than 200k real SNPs. 250 metabolic outcomes are simulated with block-wise dependence from Gaussian distributions. A small proportion of SNPs is associated with the metabolic outcomes; some outcomes are under pleiotropic control.
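The simulation design described above can be illustrated with a minimal Python sketch. It is not the authors' generator: the function name `simulate_mqtl`, the sample size, the block size, the within-block correlation `rho`, the effect size and the uniformly drawn minor allele frequencies are all hypothetical choices made for illustration, and the spatial correlation between SNPs mentioned in the description is deliberately omitted (SNPs are drawn independently here).

```python
import numpy as np

def simulate_mqtl(n=100, p=500, q=250, block_size=25, rho=0.5,
                  n_active=5, effect=0.5, seed=0):
    """Toy mQTL-like data: X (n x p genotypes), Y (n x q outcomes), B (p x q effects).

    All default values are illustrative assumptions, not taken from the dataset.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical minor allele frequencies (the real data use sample MAFs).
    maf = rng.uniform(0.05, 0.5, size=p)

    # Hardy-Weinberg equilibrium: minor allele count ~ Binomial(2, maf).
    # Assumption: SNPs are independent, i.e. no spatial correlation is modelled.
    X = rng.binomial(2, maf, size=(n, p)).astype(float)

    # Gaussian residuals with block-wise dependence: outcomes are
    # equicorrelated (rho) within a block and independent across blocks.
    E = np.empty((n, q))
    for start in range(0, q, block_size):
        b = min(block_size, q - start)
        cov = np.full((b, b), rho) + (1.0 - rho) * np.eye(b)
        E[:, start:start + b] = rng.multivariate_normal(np.zeros(b), cov, size=n)

    # Sparse effects: a small number of active SNPs, each associated with
    # between 1 and 10 outcomes (several outcomes per SNP mimics pleiotropy).
    B = np.zeros((p, q))
    active = rng.choice(p, size=n_active, replace=False)
    for j in active:
        k = rng.integers(1, 11)
        targets = rng.choice(q, size=k, replace=False)
        B[j, targets] = effect

    Y = X @ B + E
    return X, Y, B

X, Y, B = simulate_mqtl()
```

The nonzero rows of `B` identify the active SNPs, and the number of nonzero entries in each such row gives the number of outcomes that SNP controls in this toy setting.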

    Determination of dansylated polyamines in red blood cells by liquid chromatography-tandem mass spectrometry.

    The concentration of polyamines in red blood cells (RBCs) is considered to be an index of cell proliferation. This index has been demonstrated to be of clinical importance for the follow-up and treatment of some cancer patients. The concentration of polyamines in RBCs is usually determined by high-performance liquid chromatography (HPLC) with fluorescence detection. In the current work, we present a liquid chromatography-tandem mass spectrometry (LC-MS/MS) method for the quantification of putrescine, spermidine, and spermine, the three major polyamines in RBCs. The polyamines were dansylated and analyzed by an LC gradient of 20-min duration on a C18 column on-line with a tandem mass spectrometer. An internal standard (1,8-diaminooctane) was used for quantification. This method exhibited excellent linearity for the three polyamines with regression coefficients higher than 0.99. The limits of detection for putrescine, spermidine, and spermine were 0.10, 0.75, and 0.50 pmol/ml, respectively. The intrarun precision values for putrescine, spermidine, and spermine were all better than 10%, and the interrun precision values were 13%, 9%, and 20%, respectively. The LC-MS/MS method is sufficiently simple and reliable to replace the currently used HPLC method with fluorescence detection, in which putrescine is not always detectable.